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Abstract. Basically, in (one-player) war Real Time Strategy (wRTS) 
games a human player controls, in real time, an army consisting of a num- 
ber of soldiers and her aim is to destroy the opponent's assets where the 
opponent is a virtual (i.e., non-human player controlled) player that usu- 
ally consists of a pre-programmed decision-making script. These scripts 
have usually associated some well-known problems (e.g., predictability, 
non-rationality, repetitive behaviors, and sensation of artificial stupidity 
among others). This paper describes a method for the automatic gener- 
ation of virtual players that adapt to the player skills; this is done by 
building initially a model of the player behavior in real time during the 
game, and further evolving the virtual player via this model in-between 
two games. The paper also shows preliminary results obtained on a one- 
player wRTS game constructed specifically for experimentation. 



1 Introduction and related work 

In an era in which computing power has boosted the graphical quality of 
videogames, players have turned their attention to other aspects of the game. In 
particular, they mostly request opponents exhibiting intelligent behavior. How- 
ever, intelligence does not generally equate to playing proficiency but rather to 
interesting behaviors [1]. This issue is specifically relevant in real-time strategy 
(RTS) games that often employ two kinds of artificial intelligence (AI) [2]: one 
represented by a virtual player (VP, or non-player character - NPC) making 
decisions on a set of units (i.e., warriors, machines, etc.), and another one cor- 
responding to the small units (usually with little or no intelligence). The design 
of these AIs is a complex task, and the reality in the industry is that in most of 
the RTS games, the NPC is basically controlled by a fixed script that has been 
previously programmed based on the experience of the designer/programmer. 
This script often comprises hundreds of rules, in the form if the game is in 
state S then Unit U should execute action A, to control the behavior of 
the components (i.e., units) under its control. This arises known problems: for 
instance, for players, the opponent behavior guided by fixed scripts can become 
predictable for the experienced player. Also, for AI programmers, the design of 



virtual players (VPs) can be frustrating because the games are becoming more 
and more complex with hundreds of different situations/states and therefore it 
is not easy to predict all the possible situations that could potentially happen 
and even more difficult to decide which is the most appropriate actions to take 
in such situations. As consequence, many RTS games contain 'holes' in the sense 
that the game stagnates or behaves incorrectly under very specific conditions 
(these problems rely on the category of 'artificial stupidity' [1]). Thus the reality 
of the simulation is drastically reduced and so too the interest of the player; 

In addition, there are very interesting research problems in developing AI for 
Real-Time Strategy (RTS) games including planning in an uncertain world with 
incomplete information, learning, opponent modeling, and spatial and temporal 
reasoning [3]. This AI design is usually very hard due to the complexity of 
the search space (states describe large playing scenarios with hundreds of units 
simultaneously acting). Particular problems are caused by the large search spaces 
(environments consisting of many thousands of possible positions for each of 
hundreds, possibly thousands, of units) and the parallel nature of the problem - 
unlike traditional games, any number of moves may be made simultaneously [4]. 
Qualitative spatial reasoning (QSR) techniques can be used to reduce complex 
spatial states (e.g., using abstract representations of the space [5]). Regarding 
evolutionary techniques, a number of biologically-inspired algorithms and multi- 
agent based methods have already been applied to handle many of the mentioned 
problems in the implementation of RTS games [6-13]. 

Even in the case of designing a very good script for VPs, the designer has to 
confront another well known problem: the VP behavior is usually fixed and rarely 
adapts to the level (i.e., skills) of the player. In fact the player can lose interest 
in the game because either she is able to beat all the opponents in each level, or 
the virtual player always beat her. In this sense, the design of interesting (e.g., 
non-predictable) NPCs is not the only challenge though. It also has to adapt to 
the human player, since she might lose interest in the game otherwise (i.e., if the 
NPCs arc too easy or too hard to beat). 

There are many benefits attempting to build adaptive learning AI systems 
which may exist at multiple levels of the game hierarchy, and which evolve over 
time. This paper precisely deals with the issue of generating behaviors for the 
virtual player that evolve in accordance with the player's increasing abilities. 
This behavior emerges according to the player skill, and this emergent feature 
can make a RTS game more entertaining and less predictable in the sense that 
emergent behavior is not explicitly programmed but simply happens [14]. The 
attainment of adjustable and emergent virtual players consists here of a process 
of two stages that are iteratively executed in sequence: (1) a behavior model 
of the human player is created in real time during the execution of a game, 
and further the virtual player is evolved off-line (i.e., in between two games) 
via evolutionary algorithms till a state that it can compete with the player (not 
necessarily to beat her but to keep her interest in the game) . This approach has 
been applied on a RTS game constructed specifically for experimentation, and 
we report here our experience. 



2 The Game 



Here we describe the basic elements that compose our wRTS game. 

Scenario. It will also be called indistinctly region or world and consists of a 
two-dimensional non-toroidal heterogeneous hostile and dynamic grid- world. The 
world is heterogeneous because the terrain is not uniform, hostile because there 
exist a virtual army whose mission is to destroy the human player-controlled 
army, and dynamic because the game conditions change depending on the actions 
taken by units of both armies. Regarding the heterogeneity of the terrain, each 
grid in the region can have one of the three following values: passable (each uni 
can traverse this grid), impassable (no unit can traverse it), and semi-impassable 
(there is a penalization of 1 point of energy - see below -to traverse it). 

Army. There are two armies (also called indistinctly teams, represented by 
spiders and ladybirds) with a number of units (also called indistinctly soldiers or 
agents) and a flag to be defended. One army is controlled by the human player 
whereas the other one is guided by some kind of artificial intelligence (AI). 

Given a unit u placed in a position (x u ,y u ) in the grid, its visual range (VR„) 
embraces any position (x, y) that is placed to a maximum distance cj>, that is to 
say, VR U = {(x,y) | y/ (x u - x) 2 + (y u - y) 2 < </>}. Initially, each agent only 
knows the grids that belongs to its visual range. In fact, during the game the 
scenario is not completely known for an army and only those regions that were 
already visualized by its agents are known. The global information that each 
army (independently if this is controlled by the AI or the player) has is the sum 
of all the information of their constituent soldiers. Note that (human or virtual) 
players only know the rival flag position if this has been detected in a grid by 
some of its agents previously. Also, each unit interacts with the environment by 
executing a number of actions (see below) and has certain level of health and 
energy that decreases with these interactions. The initial values for the health 
and energy of each soldier are 100 and 1000 respectively. 

Fights. Full body combat can be executed at the soldier level by assigning a 
random value (in the interval [0,1]) to each unit involved in a fight with a rival 
soldier; the energy level of the unit with lowest value decreases 1 unit. A unit 
dies when its energy is zero. All the combats between soldiers are executed in 
sequence in one turn of the game. 

Decision-making. Along the game, the role of players is to make decisions 
with respect to the actions that the units of their respective associated armies 
should take. More specifically, both the virtual and human player can select a 
set of agents and further assign it a specific order that basically consists of a 
common action that each unit in this set has to execute. 

Actions. The game is executed in turns; every turn, each soldier executes the 
last order received by its player. Six actions are possible in this game: 

— Move forward enemy: if applied to unit u, then this has to move towards the 
closer rival soldier according to VR U , otherwise (i.e., if no rival soldier exists 
in VR„) agent u has to move in direction to the region where there are more 
enemies according to the global information of the army. 



— Group/Run away: the soldier has to group with other team mates placed 
inside its visual range; if there are not mates in this range, then the agent 
has to move towards the position where its flag is placed. 

— Move forward objective: the agent advances towards the opponent's flag 
position if this is known, otherwise it moves randomly. 

— No operation: execute no operation. 

— Explore: the soldier moves towards a non-explored-yet region of the map. 

— Protect flag: The soldier has to move towards the position of its own flag; if 
it is already near to it, unit just should scout the zone. 

Perceptions. Each unit knows its specific situation in the game from the 
perceptions that receives in its visual range. In our game, the agent state is 
determined by its health (measured in three range values: low, medium and 
high), and the following boolean perceptions: 

1. Advantage state? (S a ): "the unit is into a handicap situation". This percep- 
tion is true if, according to the information in its visual range, the number 
of mate soldiers is higher than the number of rival units, and false otherwise. 

2. Under attack? (U a ): this perception is true of the soldier is involved in a full 
body combat with another opponent unit, and false otherwise. 

3. Objective visible? (O v ): the opponent's flag is visible in its visual range. 

Game objective. The army that first captures the rival flag is the winner; 
if this situation is never reached after a number of turns, then the winner is 
the player whose army inflicts a higher damage to its rival army and where the 
damage is measured as number of dead units. 

2.1 Notes on specific issues 

Observe that 'fighting' is not a specific action to be executed by the army units. 
This should not be surprising because 'fighting' is not usually controlled by 
players in standard RTS games as this action should often is executed at the 
unit level and not at the army level; in our game this action is automatically 
executed when a unit and a rival soldier meet in any grid of the scenario. 

Also, note that five of the six possible actions that units can execute require 
to make movements in the game world. This basically means that some kind 
of pathfinding should be processed to do the movement realistic. Classical al- 
gorithms such as A* were not good candidates for this task since, as already 
indicated, most of the scenario is not known for the army units. A practical so- 
lution was to let units make a good-enough movement depending on its scenario 
knowledge; several cases were covered: Concave obstacles: the unit moves by 
considering different angles with respect to its position in the scenario; Convex 
obstacles: the unit moves by following the obstacle contour in a specific direction; 
and Bridges and narrow passages: a simple ant-colony algorithm was executed. 

Due to space limitations we will not explain them in details as this is beyond 
the objective of this paper. In any case, Figures 1(a), (b) and (c) show illustrative 
examples of the cases, Figure 1(d) illustrates the classical behavior of our ant 



colony algorithms, and Figure 1(e) displays a screenshot of our wRTS game 1 
titled aracnia. Note also that a description of the technical issues of the game 
(e.g., implementation of basic actions, graphical considerations, input/output 
interface, scene rendering and memory management, among others) is beyond 
the scope of this paper. In the following we will focus on the process of designing 
virtual players. 




(a) Concave obstacles (b) Convex obstacles (c) bridges and passages 




Game 

Fig. 1. (a)-(c) Movement examples; (d) illustration of classical execution of ant-colony 
algorithm used for pathfinding; (e) game screenshot (black cells represent non-explored 
regions yet, blue cells impassable grids, and green cells passable regions in the scenario. 

3 Virtual player design 

Here we describe our proposal to design adaptive virtual players in our wRTS. 
3.1 General issues 

Our aim is to generate controllers governing automatically (i.e., without hu- 
man intervention) the behavior of an entire team (i.e., a set of autonomous 
agents/soldiers). An army controller can be viewed as a set of rules that will 
control the reactions of its agents according to the current state of play. De- 
pending on its situation, the agent executes a particular action that may modify 



1 A prototype version of this game can be unloaded from 
http : //www. Ice .uma.es/~afdez/aracnia. 



Algorithm 1: PMEA (VP) 

1 VP 4— RBP; // Rule-based expert system 

2 for i G do 

3 I PlaycrModcl <— PlayGameOn(VP); // Player modeling (PM) 

4 | VP <- EA(PlayorModol,VP); // Evolutionary optimization CEA) 

5 end for 

6 return VP; 



the state of the agent itself. The global team strategy emerges as the result of 
the sum of the specific behavior of all the agents. 

However, the definition of specific strategies for each agent is complex, re- 
quires a profound knowledge of the required behavior of each agent, and results 
in predictable behavior. Moreover, devising independent controllers for each unit 
in a RTS game is costly; in fact this is not realistic as requires a high computation 
effort that surely decreases the quality of the real-time rendering of the game. 
One solution consists of designing global team strategies; however, again this is 
very complex as the programmer has to cope with too many possibilities that 
arise from the different interactions among all the agents of both armies. We have 
opted by a more realistic process: all the soldiers will be managed by the same 
controller, and in any instant of the game, the team behavior will be the result 
of summing all the action taken by each of its constituent units. This means that 
we have to produce just one controller to devise a global team strategy. Note 
however that this does not mean all the agents execute the same action because 
the action to be executed by a soldier will depend on its particular situation in 
the match. 

3.2 A two phases process 

The procedure described in the following has the aim of creating a virtual player 
whose behavior auto-evolves according to the skills that human player exhibits 
along the game. Algorithm 1 displays the general schema of the process: initially 
(in the first game), the virtual player (VP) is constructed as an expert system 
(i.e., a rule-based prototype (RBP)) that comprises tens of rules trying to cover 
all the possible situations in which an agent might be involved. This system was 
the result of a number of trials designed from the experiences of the authors 
playing wRTS games. Then, let be {1, . . . , h} henceforth, and assume the 
human player will play a number of p games, the procedure consists of two phases 
that are sequentially executed. Firstly, a player modeling phase is conducted; 
this step is described in Section 3.3 and basically consists of building a model 
of the behavior that human player exhibits during the game (i.e., on-line). The 
second phase, described in Section 3.4, is devoted to construct the virtual player 
by evolving a strategy controlling the functioning of a virtual army unit via 
an evolutionary algorithm (EA). In the following we describe in details both 
phases. Firstly, we discuss how the virtual player is internally encoded as this is 
important to understand the functioning of the whole schema. 

Representation: the virtual player contains a NPC army strategy, and its 
internal encoding corresponds with both the rule-based expert system and any 
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Fig. 2. Example of an arbitrary encoding (left), and extended answer matrix (right). 



individual in the EA population. In the following we will refer to 'individual' 
in a general sense. Every individual represents a team strategy, that is to say, 
a common set of actions that each army unit has to execute under its specific 
situation in the environment. An individual is represented as a vector v of k cells 
where k is the number of different situations in which the agent can be in the 
game, and v[i] (for < i < k) contains the action to be taken in situation i. In 
other words, assuming m perceptions, where the perception pj (for < j < m— 1) 
can have kj possible values, then k = fcrj * k\ * . . . * fc m _i and the cell: 

u[e m -i+e TO _ 2 *A; m -i+e TO _3*(fc m _i*fe m _2)+e m _4*(A; rr j_i*A; m _2*fc m -3)+ ] 

contains the action to be executed by a specific unit when its perceptions 
Po,Pi, ■ ■ ■ ,Pm-i have the values erj, ei, . . . , e m _i respectively. As indicated in 
Section 2, the state of a unit in our wRTS game is determined by three boolean 
perceptions and its energy level (with 3 different values) so that the encoding of 
an individual consists of a vector with 24 genes; this vector will be called answer 
matrix, and each cell in the vector will contain an action. Remember that 6 pos- 
sible actions can be executed in our wRTS and thus the search space (i.e., the 
number of different strategies that can be generated from this representation) 
is 6 24 . Figure 2(left) displays an example of a possible encoding. The optimal 
solution (if it exists) would be that strategy which always select the best action 
to be executed for the agents under all possible environmental conditions. In 
fact, this vast search space makes this problem impracticable for exact methods 
and justifies the use of evolutionary algorithms. 

3.3 On-line phase: User modeling 

The player behavior will be modeled as a ruled-based strategy encoded as ex- 
plained above. This process requires collecting, during the execution of the game 
(i.e., in real time) the actions that human player executes, recording additionally 
the specific conditions under which these are performed. At the end of this pro- 
cess we generate an extended answer matrix (v + ) that is an answer matrix (i.e., 



Algorithm 2: EA(PModel, VP)) // PModel = Player Model, VP = 
Virtual Player 

1 for i e ^popsize-l d° 

2 | popi <— Random-Solution(); 

3 end for 

4 pOPp p S i zo ^VP; 

s i <- 0; 

6 while i < MaxGenerations do 

7 Rank-Population (pop); // sort population according to fitness 

8 parenti -(—Select (pop); // roulette wheel 

9 parent? ■<— Select (pop); 

10 if Rand[0, 1] < px then // recombination is done 

11 \ (childi , child?) <— Recombine (parenti, parent?); 

12 else 

13 j (childi, child?) ^ (par enti, parent?); 

14 end if 

15 childi 4— Mutate (childi, pm); II Pm = mutation probability 

16 child? 4— Mutate (child?, pm); 

17 fitnessi <— PlayGameOff(P M odel , childi); 

18 fitness? <— PfayG ameOff(P M odel, child?); 

19 pop <— REPLACE(pop, childi, child?); // (popsize + 2) replacement 

20 i <— i + 1; 

21 end while 

22 return best solution in pop; 



an individual encoding v) where each cell v + [i] (for < i < k) represents now 
a vector of 6 positions (one per action) and v + [z][a] (for some action a € [1,6]) 
contains the probability that human player executes action a under the environ- 
ment conditions (i.e., the state) associated to cell v[i]. Figure 2(right) displays 
an example that shows the probability of executing each of the 6 possible actions 
in a specific situation of the unit (i.e., the soldier has medium energy, is in an 
advantage state, is not suffering an attack, and knows where the opponent flag is 
placed). This extended answer matrix is finally used to design the virtual player 
as follows: VP{i] = argmax a ^[i^{v + [i] [a]}, for all possible situations i. 

3.4 Off-line phase: evolutionary optimization 

Algorithm 2 shows the basic schema of our EA. The initial population is ran- 
domly generated except one individual that is charged with the virtual player 
(lines 1-4). Then a classical process of evolution (lines 6-21) is performed and the 
best solution in the population is finally returned (draws are broken randomly). 

Evaluating the fitness of an individual x requires the simulation (off-line) of 
the game between the player model and the virtual player strategy encoded in 
x. The fitness function depends on the statistical data collected at the end of 
the simulation. The higher the number of statistical data to be collected, the 
higher the computational cost will be. A priori, a good policy is to consider 
a limited number of data. Four data were used in our experiments during off- 
line evaluation: A: Number of deaths in the human player army; B: number of 
deaths in virtual player army; C: number of movements; and D: victory degree 
(i.e., 1 if virtual player wins and 2 otherwise). Fitness function was then defined 
as fitness(x) = 1000 p^~" B - > ; higher the fitness value, better the strategy. This 
fitness was coded to evolve towards aggressive solutions. 



4 Experimental analysis 



The experiments were performed using two algorithms: our initial expert system 
(RBP), and the algorithm PMEA (i.e., player modeling + EA) presented in 
Algorithm 1. As to the PMEA, the EA uses popsize = 50, px = -7, pu = -01, 
and MaxGenerations = 125; mutation is executed as usual at the gene level 
by changing an action to any other action randomly chosen. Three different 
scenarios where created for experimentation: (1) A map with size 50 x 50 grids, 
48 agents in VP army, and 32 soldiers in the human player (HP) team; (2) a 
map 54 x 46, with 43 VP soldiers, and 43 HP units; and (3) a map 50 x 28, 
with 48 VP soldiers, and 53 HP units. Algorithm 1 was executed for a value of 
p = 20 (i.e., 20 different games were sequentially played), and the RBP was also 
executed 20 times; Table 1 shows the results obtained. 

Table 1. Results: V~P win = number of virtual player's victories, HP m ; n = number of 
human player's victories, BPdeath = average number of deaths in the HP army, VP death 
— average number of deaths in the VP army, mov = average number of movements, 
and time — average time (minutes) dedicated per game 
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Even though in two of the three scenarios PMEA behaves better than RBP, 
note that no significant differences are shown; this is however an expected re- 
sult as we have considered just one player what means that the player models 
obtained in-between two games are likely similar and thus their corresponding 
virtual players also are. In any case, this demonstrates that our approach is 
feasible as it produces virtual players comparable - and sometimes better - to 
specific and specialized pre-programmed scripts. 

5 Conclusions 

We have described an algorithm to design automatically strategies exhibiting 
emergent behaviors that adapt to the user skills in a one-player war real time 
strategy game (wRTS); to do so, a model to mimic how the human player acts 
in the game is first constructed during the game, and further a strategy for a 
virtual player is evolved (in between games) via an evolutionary algorithm. 

Our proposal was compared with an expert system designed specifically for 
the game. Whereas no significance differences have been highlighted in the ex- 
periments, we make note that our approach has evident advantages compared to 
classical manufactured scripts (i.e., expert systems) used in videogame industry: 



for instance, it avoids the predictability of actions to be executed by the virtual 
player and thus guarantees to maintain the interest of the player This is specially 
interesting when the game involves more than one player as our approach would 
allow to construct virtual players adapted particularly to each of the human 
players (and this cannot be obtained with a pre-programmed script). 

Further research will cope with multi-player games and thus multi-objective 
evolutionary programming techniques should be considered. 
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